Phrase Based Language Model for Statistical Machine Translation - empirical study

نویسنده

  • Geliang Chen
چکیده

Reordering is a challenge to machine translation (MT) systems. In MT, the widely used approach is to apply word based language model (LM) which considers the constituent units of a sentence as words. In speech recognition (SR), some phrase based LM have been proposed. However, those LMs are not necessarily suitable or optimal for reordering. We propose two phrase based LMs which considers the constituent units of a sentence as phrases. Experiments show that our phrase based LMs outperform the word based LM with the respect of perplexity and n-best list re-ranking.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...

متن کامل

The RWTH Aachen German-English Machine Translation System for WMT 2015

This paper describes the statistical machine translation system developed at RWTH Aachen University for the German→English translation task of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015). A phrase-based machine translation system was applied and augmented with hierarchical phrase reordering and word class language models. Further, we ran discriminative maximum ex...

متن کامل

An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation

The treatment of ‘spurious’ words of source language is an important problem but often ignored in the discussion on phrase-based SMT. This paper explains why it is important and why it is not a trivial problem, and proposes three models to handle spurious source words. Experiments show that any source word deletion model can improve a phrase-based system by at least 1.6 BLEU points and the most...

متن کامل

An Empirical Analysis of Source Context Features for Phrase-Based Statistical Machine Translation

Statistical phrase-based machine translation systems make only little use of context information: while the language model takes into account target side context, context information on the source side is typically not integrated into phrase-based translation systems. Translational features such as phrase translation probabilities are learned from phrase-translation pairs extracted from word-al...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1501.05203  شماره 

صفحات  -

تاریخ انتشار 2015